SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning
نویسندگان
چکیده
Systems for declarative large-scale machine learning (ML) algorithms aim at high-level algorithm specification and automatic optimization of runtime execution plans. State-ofthe-art compilers rely on algebraic rewrites and operator selection, including fused operators to avoid materialized intermediates, reduce memory bandwidth requirements, and exploit sparsity across chains of operations. However, the unlimited number of relevant patterns for rewrites and operators poses challenges in terms of development effort and high performance impact. Query compilation has been studied extensively in the database literature, but ML programs additionally require handling linear algebra and exploiting algebraic properties, DAG structures, and sparsity. In this paper, we introduce Spoof, an architecture to automatically (1) identify algebraic simplification rewrites, and (2) generate fused operators in a holistic framework. We describe a snapshot of the overall system, including key techniques of sum-product optimization and code generation. Preliminary experiments show performance close to hand-coded fused operators, significant improvements over a baseline without fused operators, and moderate compilation overhead.
منابع مشابه
On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML
Many large-scale machine learning (ML) systems allow specifying custom ML algorithms by means of linear algebra programs, and then automatically generate efficient execution plans. In this context, optimization opportunities for fused operators—in terms of fused chains of basic operators—are ubiquitous. These opportunities include (1) fewer materialized intermediates, (2) fewer scans of input d...
متن کاملTwo-stage fuzzy-stochastic programming for parallel machine scheduling problem with machine deterioration and operator learning effect
This paper deals with the determination of machine numbers and production schedules in manufacturing environments. In this line, a two-stage fuzzy stochastic programming model is discussed with fuzzy processing times where both deterioration and learning effects are evaluated simultaneously. The first stage focuses on the type and number of machines in order to minimize the total costs associat...
متن کاملA Mathematical Programming Model and Genetic Algorithm for a Multi-Product Single Machine Scheduling Problem with Rework Processes
In this paper, a multi-product single machine scheduling problem with the possibility of producing defected jobs, is considered. We concern rework in the scheduling environment and propose a mixed-integer programming (MIP) model for the problem. Based on the philosophy of just-in-time production, minimization of the sum of earliness and tardiness costs is taken into account as the objective fu...
متن کاملExtended and infinite ordered weighted averaging and sum operators with numerical examples
This study discusses some variants of Ordered WeightedAveraging (OWA) operators and related information aggregation methods. Indetail, we define the Extended Ordered Weighted Sum (EOWS) operator and theExtended Ordered Weighted Averaging (EOWA) operator, which are applied inscientometrics evaluation where the preference is over finitely manyrepresentative works. As...
متن کاملComparative Analysis of Machine Learning Algorithms with Optimization Purposes
The field of optimization and machine learning are increasingly interplayed and optimization in different problems leads to the use of machine learning approaches. Machine learning algorithms work in reasonable computational time for specific classes of problems and have important role in extracting knowledge from large amount of data. In this paper, a methodology has been employed to opt...
متن کامل